Improving Sentence Similarity Measurement by Incorporating Sentential Word Importance
Authors
Abstract
Measuring similarity between sentences plays an important role in textual applications such as document summarization and question answering. While various sentence similarity measures have recently been proposed, these measures typically account for word importance only through inverse document frequency (IDF) weighting. IDF values are based on global information compiled over a large corpus of documents, and we hypothesise that at the sentence level better performance can be achieved by using a measure of the importance of a word within the sentence in which it appears. In this paper we show how the PageRank graph-centrality algorithm can be used to assign a numerical measure of importance to each word in a sentence, and how these values can be incorporated within various sentence similarity measures. Results from applying the measures to a difficult sentence clustering task demonstrate that incorporating sentential word importance leads to a statistically significant improvement in clustering performance, as evaluated using a range of external clustering criteria.
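The abstract does not say how the word graph is built or which similarity measures are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a fully connected graph over the distinct words of a sentence, with edges weighted by a word-to-word similarity function. The names `char_bigram_overlap`, `pagerank_word_importance` and `weighted_cosine` are hypothetical, and the character-bigram similarity is a toy stand-in for whatever lexical similarity a real system would use.

```python
import math


def char_bigram_overlap(a, b):
    """Toy word-to-word similarity (assumption): Jaccard overlap of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def pagerank_word_importance(words, similarity, damping=0.85, iterations=50):
    """Score each distinct word with PageRank over a weighted word graph.

    The returned scores sum to 1 for any non-empty sentence.
    """
    vocab = sorted(set(words))
    n = len(vocab)
    if n == 0:
        return {}
    if n == 1:
        return {vocab[0]: 1.0}
    # Edge weight between every pair of distinct words (graph design is an assumption).
    weights = {u: {v: similarity(u, v) for v in vocab if v != u} for u in vocab}
    out_sum = {u: sum(weights[u].values()) for u in vocab}
    scores = {w: 1.0 / n for w in vocab}
    for _ in range(iterations):
        # Words with no outgoing weight spread their score uniformly.
        dangling = sum(scores[u] for u in vocab if out_sum[u] == 0)
        updated = {}
        for v in vocab:
            rank = (1.0 - damping) / n + damping * dangling / n
            for u in vocab:
                if u != v and out_sum[u] > 0:
                    rank += damping * scores[u] * weights[u][v] / out_sum[u]
            updated[v] = rank
        scores = updated
    return scores


def weighted_cosine(sentence_a, sentence_b, similarity=char_bigram_overlap):
    """Cosine similarity between sentences, using PageRank scores as word weights."""
    w_a = pagerank_word_importance(sentence_a, similarity)
    w_b = pagerank_word_importance(sentence_b, similarity)
    dot = sum(w_a[w] * w_b[w] for w in w_a.keys() & w_b.keys())
    norm_a = math.sqrt(sum(v * v for v in w_a.values()))
    norm_b = math.sqrt(sum(v * v for v in w_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch the PageRank scores simply take the place that IDF weights would occupy in a vector-space cosine measure; other similarity measures could consume the same per-sentence scores in a similar way.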
Similar resources
Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model
Sentence Similarity [SS] computes a similarity score between two sentences. The SS task differs from document level semantics tasks in that it features the sparsity of words in a data unit, i.e. a sentence. Accordingly it is crucial to robustly model each word in a sentence to capture the complete semantic picture of the sentence. In this paper, we hypothesize that by better modeling lexical se...
First Language Activation during Second Language Lexical Processing in a Sentential Context
Lexicalization-patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...
Word associations are formed incidentally during sentential semantic integration.
Sentential context facilitates the incidental formation of word associations (e.g., Prior, A., & Bentin, S. (2003). Incidental formation of episodic associations: the importance of sentential context. Memory and Cognition, 31(2), 306-316). The present study explored the mechanism of this effect. In two experiments, unrelated word pairs were embedded in coherent or semantically anomalous sentenc...
Improving Translation Memory with Word Alignment Information
This paper describes a generalized translation memory system, which takes advantage of sentence level matching, sub-sentential matching, and pattern-based machine translation technologies. All of the three techniques generate translation suggestions with the assistance of word alignment information. For the sentence level matching, the system generates the translation suggestion by modifying th...
Word-Order and Lexical-Semantic Factors Influencing Thematic Role Assignment Strategies in Sentence Comprehension
How do people transform the surface structure of a sentence into the sentence’s intended meaning? Word order is an important cue used by people in sentence comprehension (Bates et al., 1982; Ferreira, 2003). Another important cue is the meaning of individual words (i.e. nouns and verbs) which we refer to as “content words”. In these and many approaches in sentence processing studies, the focus ...
Publication date: 2010